ChatGPT passed the Turing Test. Now what?

Popular Science

ChatGPT passed the Turing Test: the AI fooled 73% of participants into thinking it was human, raising new questions about machine intelligence. As artificial intelligence gets better and better, people face machines that look--and act--surprisingly human. It seems that every day brings a new headline about the burgeoning capabilities of large language models (LLMs) like ChatGPT and Google's Gemini--headlines that are either exciting or increasingly apocalyptic, depending on one's point of view. One particularly striking story arrived earlier this year: a paper describing how an LLM had passed the Turing Test, an experiment devised in the 1950s by computer science pioneer Alan Turing to determine whether machine intelligence could be distinguished from that of a human. The LLM in question was ChatGPT 4.5, and the paper found it strikingly successful in fooling people into thinking it was human: in an experiment where participants were asked to choose whether the chatbot or an actual human was the real person, nearly three out of four chose the former.


Comparing Human and AI Rater Effects Using the Many-Facet Rasch Model

Jiao, Hong, Song, Dan, Lee, Won-Chan

arXiv.org Artificial Intelligence

Large language models (LLMs) have been widely explored for automated scoring in low-stakes assessment to facilitate learning and instruction. Before LLMs are used for automated scoring in practice, empirical evidence is needed on which LLM produces the most reliable scores and induces the fewest rater effects. This study compared ten LLMs (ChatGPT 3.5, ChatGPT 4, ChatGPT 4o, OpenAI o1, Claude 3.5 Sonnet, Gemini 1.5, Gemini 1.5 Pro, Gemini 2.0, DeepSeek V3, and DeepSeek R1) with human expert raters in scoring two types of writing tasks. The accuracy of the holistic and analytic scores from LLMs relative to human raters was evaluated in terms of quadratic weighted kappa. Intra-rater consistency across prompts was compared in terms of Cronbach's alpha. Rater effects of LLMs were evaluated and compared with human raters using the Many-Facet Rasch model. The results generally supported the use of ChatGPT 4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet, which showed high scoring accuracy, better rater reliability, and fewer rater effects.
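For readers unfamiliar with the agreement metric, a minimal sketch of quadratic weighted kappa in Python follows. It illustrates the standard formula, not code from the study, and the score arrays are invented placeholders.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Quadratic weighted kappa between two raters' integer scores."""
    rater_a = np.asarray(rater_a)
    rater_b = np.asarray(rater_b)
    # Observed joint score distribution, normalized to sum to 1.
    observed = np.zeros((n_categories, n_categories))
    for a, b in zip(rater_a, rater_b):
        observed[a, b] += 1
    observed /= observed.sum()
    # Expected joint distribution if the two raters were independent.
    hist_a = np.bincount(rater_a, minlength=n_categories) / len(rater_a)
    hist_b = np.bincount(rater_b, minlength=n_categories) / len(rater_b)
    expected = np.outer(hist_a, hist_b)
    # Quadratic disagreement weights: big score gaps cost more.
    idx = np.arange(n_categories)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1 - (weights * observed).sum() / (weights * expected).sum()

# Example: human scores vs. hypothetical LLM scores on a 0-4 rubric.
human = [3, 2, 4, 1, 0, 2, 3]
llm   = [3, 2, 3, 1, 1, 2, 4]
print(quadratic_weighted_kappa(human, llm, n_categories=5))
```

The quadratic weights penalize large disagreements more than near-misses, which is why the metric suits ordinal rubric scores.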


ARC-NCA: Towards Developmental Solutions to the Abstraction and Reasoning Corpus

Guichard, Etienne, Reimers, Felix, Kvalsund, Mia, Lepperød, Mikkel, Nichele, Stefano

arXiv.org Artificial Intelligence

The Abstraction and Reasoning Corpus (ARC), later renamed ARC-AGI, poses a fundamental challenge in artificial general intelligence (AGI), requiring solutions that exhibit robust abstraction and reasoning capabilities across diverse tasks while only a few correct examples (a median of three) are presented. While ARC-AGI remains very challenging for artificial intelligence systems, it is rather easy for humans. This paper introduces ARC-NCA, a developmental approach leveraging standard Neural Cellular Automata (NCA) and NCA enhanced with hidden memories (EngramNCA) to tackle the ARC-AGI benchmark. NCAs are employed for their inherent ability to simulate complex dynamics and emergent patterns, mimicking developmental processes observed in biological systems. Developmental solutions may offer a promising avenue for enhancing AI's problem-solving capabilities beyond mere training-data extrapolation. ARC-NCA demonstrates how integrating developmental principles into computational models can foster adaptive reasoning and abstraction. We show that our ARC-NCA proof-of-concept results may be comparable to, and sometimes surpass, those of ChatGPT 4.5, at a fraction of the cost.
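As background on the substrate the paper builds on, here is a minimal sketch of one update step of a standard Neural Cellular Automaton in PyTorch, in the spirit of common NCA implementations: fixed Sobel perception filters, a small learned update network, and residual state updates. Channel counts and layer sizes are illustrative, the stochastic-update and alive-masking details of full NCA models are omitted, and this is not the authors' ARC-NCA or EngramNCA code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MinimalNCA(nn.Module):
    """One step of a standard Neural Cellular Automaton on a 2D grid.

    Each cell holds a state vector; perception uses fixed identity and
    Sobel filters, and a small learned network maps perceptions to updates.
    """
    def __init__(self, channels=16):
        super().__init__()
        self.channels = channels
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        identity = torch.zeros(3, 3)
        identity[1, 1] = 1.0
        kernels = torch.stack([identity, sobel_x, sobel_x.t()])  # (3, 3, 3)
        # One copy of each kernel per channel (depthwise perception).
        self.register_buffer(
            "perceive", kernels.repeat(channels, 1, 1).unsqueeze(1)
        )  # shape (3 * channels, 1, 3, 3)
        self.update = nn.Sequential(
            nn.Conv2d(3 * channels, 64, 1), nn.ReLU(),
            nn.Conv2d(64, channels, 1),
        )

    def forward(self, state):  # state: (batch, channels, H, W)
        perception = F.conv2d(state, self.perceive, padding=1,
                              groups=self.channels)
        return state + self.update(perception)  # residual update rule

grid = torch.zeros(1, 16, 8, 8)
grid[:, :, 4, 4] = 1.0            # single "seed" cell
grid = MinimalNCA()(grid)         # one developmental step
```

Iterating this step lets a pattern "grow" from the seed, which is the developmental dynamic the paper harnesses for ARC tasks.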


Neurosymbolic artificial intelligence via large language models and coherence-driven inference

Huntsman, Steve, Thomas, Jewell

arXiv.org Artificial Intelligence

We devise an algorithm to generate sets of propositions that objectively instantiate graphs that support coherence-driven inference. We then benchmark the ability of large language models (LLMs) to reconstruct coherence graphs from (a straightforward transformation of) propositions expressed in natural language, with promising results from a single prompt to models optimized for reasoning. Combining coherence-driven inference with consistency evaluations by neural models may advance the state of the art in machine cognition.
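To make "coherence-driven inference" concrete: in the classic constraint-satisfaction formulation, propositions are graph nodes, positive edges reward accepting (or rejecting) both endpoints together, negative edges reward splitting them, and inference seeks the accept/reject assignment that maximizes the satisfied constraint weight. A toy brute-force sketch follows, with invented propositions and weights; it is not the paper's algorithm.

```python
from itertools import product

# Hypothetical propositions and a small coherence graph: positive weights
# link mutually supporting propositions, negative weights link conflicting ones.
propositions = ["p1", "p2", "p3", "p4"]
edges = {
    ("p1", "p2"): +1.0,   # p1 and p2 cohere
    ("p1", "p3"): -1.0,   # p1 and p3 conflict
    ("p2", "p4"): +0.5,
    ("p3", "p4"): -0.5,
}

def coherence(assignment):
    """Total weight of satisfied constraints under an accept/reject labeling.

    A positive edge is satisfied when both propositions get the same label;
    a negative edge is satisfied when they get different labels.
    """
    total = 0.0
    for (a, b), w in edges.items():
        same = assignment[a] == assignment[b]
        if (w > 0 and same) or (w < 0 and not same):
            total += abs(w)
    return total

# Exhaustive search is fine for tiny graphs; larger ones need heuristics.
best = max(
    (dict(zip(propositions, labels))
     for labels in product([True, False], repeat=len(propositions))),
    key=coherence,
)
print(best, coherence(best))
```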


Advice for Diabetes Self-Management by ChatGPT Models: Challenges and Recommendations

Hussain, Waqar, Grundy, John

arXiv.org Artificial Intelligence

Given their advanced reasoning, extensive contextual understanding, and robust question-answering abilities, large language models have become prominent in healthcare management research. Despite adeptly handling a broad spectrum of healthcare inquiries, these models face significant challenges in delivering accurate and practical advice for chronic conditions such as diabetes. We evaluate the responses of ChatGPT versions 3.5 and 4 to diabetes patient queries, assessing their depth of medical knowledge and their capacity to deliver personalized, context-specific advice for diabetes self-management. Our findings reveal discrepancies in accuracy and embedded biases, emphasizing the models' limitations in providing tailored advice unless elicited by sophisticated prompting techniques. Additionally, we observe that both models often provide advice without seeking necessary clarification, a practice that can result in potentially dangerous recommendations. This underscores the limited practical effectiveness of these models without human oversight in clinical settings. To address these issues, we propose a commonsense layer for prompt evaluation and the incorporation of disease-specific external memory using an advanced Retrieval-Augmented Generation technique. This approach aims to improve information quality and reduce misinformation risks, contributing to more reliable AI applications in healthcare settings. Our findings seek to influence the future direction of AI in healthcare, enhancing both the scope and quality of its integration.
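As one way to picture the clarification problem the authors raise, here is a hypothetical guard that refuses to answer until required patient context is present. The field names, functions, and `llm` stand-in are invented for illustration and are not the paper's proposed evaluation layer.

```python
# Fields a diabetes-advice system arguably needs before dosing advice is safe.
REQUIRED_CONTEXT = ["diabetes_type", "current_medication", "recent_glucose"]

def missing_context(patient_context: dict) -> list[str]:
    """Return which required fields are absent, so the system can ask first."""
    return [f for f in REQUIRED_CONTEXT if not patient_context.get(f)]

def answer_or_clarify(query: str, patient_context: dict, llm) -> str:
    gaps = missing_context(patient_context)
    if gaps:
        # Ask for clarification instead of guessing; mirrors the paper's
        # observation that unprompted advice can be dangerous.
        return ("Before I can answer safely, could you tell me your "
                + ", ".join(g.replace("_", " ") for g in gaps) + "?")
    prompt = f"Patient context: {patient_context}\nQuestion: {query}"
    return llm(prompt)

reply = answer_or_clarify(
    "How much insulin should I take after a large meal?",
    {"diabetes_type": "type 1"},   # medication and glucose are missing
    llm=lambda p: "(model answer)",
)
print(reply)
```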


Can AI Help with Your Personal Finances?

Hean, Oudom, Saha, Utsha, Saha, Binita

arXiv.org Artificial Intelligence

In recent years, Large Language Models (LLMs) have emerged as a transformative development in artificial intelligence (AI), drawing significant attention from industry and academia. Trained on vast datasets, these sophisticated AI systems exhibit impressive natural language processing and content generation capabilities. This paper explores the potential of LLMs to address key challenges in personal finance, focusing on the United States. We evaluate several leading LLMs, including OpenAI's ChatGPT, Google's Gemini, Anthropic's Claude, and Meta's Llama, to assess their effectiveness in providing accurate financial advice on topics such as mortgages, taxes, loans, and investments. Our findings show that while these models achieve an average accuracy rate of approximately 70%, they also display notable limitations in certain areas. Specifically, LLMs struggle to provide accurate responses to complex financial queries, with performance varying significantly across topics. Despite these limitations, the analysis reveals marked improvements in newer versions of these models, highlighting their growing utility for individuals and financial advisors. As these AI systems continue to evolve, their potential for advancing AI-driven applications in personal finance becomes increasingly promising.
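A per-topic accuracy breakdown of the kind the paper reports can be computed in a few lines of Python; the graded results below are invented placeholders, not the paper's data.

```python
from collections import defaultdict

# Hypothetical graded responses: (topic, was_the_answer_correct).
results = [
    ("mortgages", True), ("mortgages", False), ("taxes", True),
    ("taxes", True), ("loans", False), ("investments", True),
]

by_topic = defaultdict(list)
for topic, correct in results:
    by_topic[topic].append(correct)

overall = sum(c for _, c in results) / len(results)
print(f"overall accuracy: {overall:.0%}")
for topic, marks in sorted(by_topic.items()):
    print(f"{topic}: {sum(marks) / len(marks):.0%} (n={len(marks)})")
```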


Evaluation of the Code Generation Capabilities of ChatGPT 4: A Comparative Analysis in 19 Programming Languages

Gilbert, L. C.

arXiv.org Artificial Intelligence

This bachelor's thesis examines the capabilities of ChatGPT 4 in code generation across 19 programming languages. Through a quantitative experiment, the study analyzed solution rates across three difficulty levels, the types of errors encountered, and code quality in terms of runtime and memory efficiency. A total of 188 programming problems were selected from the LeetCode platform, and ChatGPT 4 was given three attempts to produce a correct solution, with feedback after each failed attempt. ChatGPT 4 successfully solved 39.67% of all tasks, with success rates decreasing significantly as problem complexity increased. Notably, the model faced considerable challenges with hard problems across all languages. ChatGPT 4 demonstrated higher competence in widely used languages, likely due to a larger volume and higher quality of training data. The solution rates also revealed a preference for languages with low abstraction levels and static typing. For popular languages, the most frequent error was "Wrong Answer," whereas for less popular languages compiler and runtime errors prevailed, suggesting frequent misunderstandings and confusion regarding the structural characteristics of these languages. The model exhibited above-average runtime efficiency in all programming languages, again tending toward statically typed and low-abstraction languages. Memory efficiency results varied significantly, with above-average performance in 14 languages and below-average performance in five. In terms of memory efficiency, a slight preference for low-abstraction languages and a leaning toward dynamically typed languages were observed. Future research should include a larger number of tasks, iterations, and less popular languages. Additionally, ChatGPT 4's abilities in code interpretation and summarization, debugging, and the development of complex, practical code could be analyzed further.
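The three-attempt, feedback-driven protocol described above can be pictured as a small driver loop; `generate_solution` and `run_tests` are hypothetical stand-ins, not the thesis's actual harness.

```python
def solve_with_feedback(problem, generate_solution, run_tests, max_attempts=3):
    """Give the model up to three attempts, feeding back test failures."""
    feedback = None
    code = None
    for attempt in range(1, max_attempts + 1):
        # The first call has no feedback; retries see the previous failure.
        code = generate_solution(problem, feedback)
        passed, report = run_tests(problem, code)
        if passed:
            return {"solved": True, "attempts": attempt, "code": code}
        feedback = report  # e.g. "Wrong Answer on input ..." or a compiler error
    return {"solved": False, "attempts": max_attempts, "code": code}
```

Aggregating the `solved` flags over all problems and languages yields solution rates like the 39.67% reported above.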


Emotion-Aware Response Generation Using Affect-Enriched Embeddings with LLMs

Rasool, Abdur, Shahzad, Muhammad Irfan, Aslam, Hafsa, Chan, Vincent

arXiv.org Artificial Intelligence

There is a need for empathetic and coherent responses in automated chatbot-facilitated psychotherapy sessions. This study addresses the challenge of enhancing the emotional and contextual understanding of large language models (LLMs) in psychiatric applications. We introduce a novel framework that integrates multiple emotion lexicons, including the NRC Emotion Lexicon, VADER, WordNet, and SentiWordNet, with state-of-the-art LLMs such as LLAMA 2, Flan-T5, ChatGPT 3.0, and ChatGPT 4.0. The primary dataset comprises over 2,000 therapy session transcripts from the Counseling and Psychotherapy database, covering discussions on anxiety, depression, trauma, and addiction. We segment the transcripts into smaller chunks, enhance them with lexical features, and compute embeddings using BERT, GPT-3, and RoBERTa to capture semantic and emotional nuances. These embeddings are stored in a FAISS vector database, enabling efficient similarity search and clustering based on cosine similarity. Upon a user query, the most relevant segments are retrieved and provided as context to the LLMs, significantly improving the models' ability to generate empathetic and contextually appropriate responses. Experimental evaluations demonstrate that incorporating emotion lexicons enhances empathy, coherence, informativeness, and fluency scores. Our findings highlight the critical role of emotional embeddings in improving LLM performance for psychotherapy.
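The retrieval step described above can be sketched as follows; FAISS implements cosine similarity as an inner product over L2-normalized vectors. The embeddings here are random placeholders rather than BERT, GPT-3, or RoBERTa outputs, and the chunk texts are invented.

```python
import numpy as np
import faiss

# Toy embeddings for transcript chunks; real ones would come from an encoder.
rng = np.random.default_rng(0)
chunks = ["chunk about anxiety", "chunk about trauma", "chunk about sleep"]
embeddings = rng.standard_normal((len(chunks), 384)).astype("float32")

# Cosine similarity via inner product over L2-normalized vectors.
faiss.normalize_L2(embeddings)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

query = rng.standard_normal((1, 384)).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, k=2)
context = [chunks[i] for i in ids[0]]  # passed to the LLM as context
print(scores, context)
```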


BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data

Wang, Xuwu, Cui, Qiwen, Tao, Yunzhe, Wang, Yiran, Chai, Ziwei, Han, Xiaotian, Liu, Boyi, Yuan, Jianbo, Su, Jing, Wang, Guoyin, Liu, Tingkai, Chen, Liyu, Liu, Tianyi, Sun, Tao, Zhang, Yufeng, Zheng, Sirui, You, Quanzeng, Yang, Yang, Yang, Hongxia

arXiv.org Artificial Intelligence

Large language models (LLMs) have become increasingly pivotal across various domains, especially in handling complex data types. This includes structured data processing, as exemplified by ChartQA and ChatGPT-Ada, and multimodal unstructured data processing, as seen in Visual Question Answering (VQA). These areas have attracted significant attention from both industry and academia. Despite this, there remains a lack of unified evaluation methodologies for these diverse data-handling scenarios. In response, we introduce BabelBench, an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal, multistructured data with code execution. BabelBench incorporates a dataset of 247 meticulously curated problems that challenge the models with tasks in perception, commonsense reasoning, logical reasoning, and more. Beyond the basic capabilities of multimodal understanding, structured data processing, and code generation, these tasks demand advanced skills in exploration, planning, reasoning, and debugging. Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement. The insights derived from our comprehensive analysis offer valuable guidance for future research within the community. The benchmark data can be found at https://github.com/FFD8FFE/babelbench.


Natural Language Interaction with a Household Electricity Knowledge-based Digital Twin

Fortuna, Carolina, Hanžel, Vid, Bertalanič, Blaž

arXiv.org Artificial Intelligence

Domain-specific digital twins, representing digital replicas of various segments of the smart grid, are envisioned as being able to model, simulate, and control their respective segments. At the same time, knowledge-based digital twins, coupled with AI, may also empower humans to understand aspects of the system through natural language interaction, in view of planning and policy making. This paper is the first to assess and report on the potential of Retrieval-Augmented Generation (RAG) for answering questions related to household electrical energy measurement, leveraging a knowledge-based energy digital twin. Relying on a recently published electricity consumption knowledge graph that effectively serves as a knowledge-based digital twin, we study the capabilities of ChatGPT, Gemini, and Llama in answering electricity-related questions. Furthermore, we compare these answers with those generated through a RAG technique that leverages an existing electricity knowledge-based digital twin. Our findings illustrate that the RAG approach not only reduces the incidence of incorrect information typically generated by LLMs but also significantly improves the quality of the output by grounding responses in verifiable data. This paper details our methodology, presents a comparative analysis of responses with and without RAG, and discusses the implications of our findings for future applications of AI in specialized sectors like energy data analysis.
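The with/without-RAG comparison can be pictured with a minimal prompt-construction sketch; `retrieve_facts`, the sample fact, and the `llm` stand-in are hypothetical, not the paper's pipeline.

```python
# Plain prompting: the model answers from its parametric knowledge alone.
def plain_answer(question, llm):
    return llm(question)

# RAG prompting: ground the answer in facts retrieved from the
# knowledge-based digital twin (e.g. triples from the knowledge graph).
def rag_answer(question, retrieve_facts, llm, k=5):
    facts = retrieve_facts(question, k=k)
    context = "\n".join(f"- {f}" for f in facts)
    prompt = (
        "Answer using only the facts below; say so if they are insufficient.\n"
        f"Facts:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)

answer = rag_answer(
    "What was household H42's average daily consumption in July?",
    retrieve_facts=lambda q, k: ["H42 avg daily consumption (July): 9.8 kWh"],
    llm=lambda p: "(model answer grounded in retrieved facts)",
)
print(answer)
```

Grounding the prompt in retrieved, verifiable facts is what drives the reduction in incorrect information the paper reports.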